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We discuss our new implementation of the Real-space Electronic Structure method for studying 
the atomic and electronic structure of infinite periodic as well as finite systems, based on density 
functional theory. This improved version which we call HARES (for High-performance-fortran 
Adaptive grid Real-space Electronic Structure) aims at making the method widely applicable and 
efficient, using high performance Fortran on parallel architectures. The scaling of various parts of 
a HARES calculation is analyzed and compared to that of plane- wave based methods. The new 
developments that lead to enhanced performance, and their parallel implementation, are presented 
in detail. We illustrate the application of HARES to the study of elemental crystalline solids, 
molecules and complex crystalline materials, such as blue bronze and zeolites. 



I. INTRODUCTION 

Technological advances are pushing the size of device 
components and the demands on their performance to 
ever smaller sizes and higher standards. These trends, 
which are only expected to accelerate in the future, make 
it imperative that the structure and behavior of systems 
at the atomistic level are thoroughly understood from a 
fundamental perspective. While experimental methods 
are steadily improving in their ability to probe atom- 
istic processes of materials, computational approaches 
provide a complementary technique for systematic and 
well-controlled studies. Tuning and optimization of the 
properties of materials, taking into account their chemi- 
cal composition, require computational methods that are 
of general applicability, unbiased and accurate. 

First-principles methods based on density functional 
theory (DFT)El have proven to be an accurate and reli- 
able tool in understanding and predicting a wide range of 
physical properties of finite (such as molecules and clus- 
ters) and extended structures (such as bulk crystalline 
solids, their defects and surfaces). Such methods must 
obtain the quantum mechanical ground state of the in- 
teracting electrons and ions, which makes them computa- 
tionally very intensive. The computational cost scales as 
a rather high power (typically 3) of the number of atoms 
or electrons in the system, which limits the sizes that 
can be investigated in a reasonable time. Improvement 
of the efficiency of first-principles calculations is therefore 
an important goal. This goal can be reached either by 
improving the algorithms to obtain better scaling with 
system size, or by exploiting modern computational re- 
sources and in particular parallel architectures. 

The scaling of the calculation of the quantum mechan- 
ical ground state of interacting electrons and ions can be 
improved by exploiting what W. Kohn has called "the 
short-sightedness" of quantum mechanics: due to screen- 
ing, interactions are essentially short ranged. A natural 
way to express this property is through the density ma- 



trix of the system. Thus, implementations of linear scal- 
ing (referred to as O(N) methods, with N a number rep- 
resenting the system size, like the number of electrons) 
typically involve density-matrix expressions with local- 
ized orbitalscl. In terms of the computational time, 0(N) 
methods prove advantageous for insulating systems with 
N > I0 3 or metallic systems with N > I0 4 electrons 
(the number of atoms in the system is typically an or- 
der of magnitude smaller than the number of electrons). 
This difference in efficiency of the O(N) methods be- 
tween insulators and metals is related to the long-range 
behavior of the density matrix which has a different fall 
off, i.e. exponential in the former vs. power law in the 
latter. These methods are well suited for the calculation 
of the total energy of the system, which provides useful 
information about its optimal structure, dynamics, and 
response to mechanical loading. In addition to the total 
energy, it is often important to study the electronic struc- 
ture of the system. This is necessary for understanding 
electronic, optical and magnetic properties, and is rele- 
vant for the study of both insulators (semiconductors) 
and metals. A DFT electronic structure calculation re- 
quires the calculation of the eigenvalue spectrum of the 
single-particle Hamiltonian, a problem which is not easily 
amenable to improvements in scaling since diagonaliza- 
tion of the Hamiltonian typically scales as N 3 . Localiza- 
tion of the electronic orbitals can be detrimental to the 
accuracy of electronic properties though it helps improve 
the efficiency of O(N) methods. 

In DFT calculations, the single-particle Hamiltonian 
matrix itself depends on the eigenfunctions, so the com- 
plete solution must be obtained by iterating the solution 
to self-consistency. The size of the Hamiltonian matrix 
depends on the basis set used to represent the electronic 
wavefunctions and the electronic charge density. Ideally, 
one would like to work with a sparse Hamiltonian, which 
can be solved efficiently using iterative algorithms; the 
use such algorithms reduces both computer memory and 
time requirements. A natural way to generate a sparse 



2 



Hamiltonian is to use a real space grid for the representa- 
tion of the electronic eigenfunctions and charge density: 
each term in the Hamiltonian, evaluated at some point 
in space, acts only on the wavefunctions at the same 
point in space, except for the Laplacian in the kinetic 
energy operator which involves several points simultane- 
ously The number of points included in the evaluation 
of the Laplacian determines the few off-diagonal non-zero 
matrix elements in each row (or column) of the Hamil- 
tonian matrix. The calculation can be made even more 
efficient by using an adaptive grid in real space for rep- 
resentation of the eigenfunctions, with points distributed 
according to the electronegativity of ions. The adaptive 
grid can be mapped onto a regular grid in curvilinear 
space through the proper definition of a metric. The reg- 
ular grid in curvilinear space makes it possible to exploit 
fully the capabilities of modern computational platforms 
based on parallel processing. Thus, this formulation of 
the problem satisfies all requirements for very efficient 
electronic structure calculations: (a) Sparsity for fast it- 
erative diagonalization of the Hamiltonian; (b) Adaptabil- 
ity for efficient distribution of grid points as demanded 
by the physical system; and (c) Efficient parallelization 
of the computation due to the natural distribution of the 
regular curvilinear space grid onto the processor grid. 

Our original implementation of such a methocH demon- 
strated the feasibility of performing calculations within 
this framework. The data structures and operations in- 
volved in this method make it easily parallelizable, par- 
ticularly using high performance Fortran (HPF). In the 
present paper we discuss several algorithmic issues that 
enhance the performance of the method and their imple- 
mentation using HPF; we refer to the new implementa- 
tion as HARES for HPF- Adaptive-grid Real-space Elec- 
tronic Structure. The paper is organized as follows: In 
section II we briefly review the theory underlying the 
HARES method and present an analysis of the compu- 
tational effort involved in the various parts of the calcu- 
lation. In section III we discuss the recent algorithmic 
enhancements and their implementation. In section IV 



we illustrate the efficacy of these algorithmic enhance- 
ments through several applications of HARES to inter- 
esting systems. These include: (a) a few simple elemental 
crystals and a few molecules composed of atoms in the 
first row of the periodic table which typically present a 
computational challenge to plane-wave (PW) methods; 

(b) blue bronze, a quasi one-dimensional conductor; and 

(c) a zeolite, that is, a complex structure composed of Si- 
O tetrahedra and large pores, which represents a molec- 
ular sieve. Section |v| contains our conclusions. 



II. THEORETICAL FRAMEWORK 

A. Density Functional Theory 

The problem of finding the quantum mechanical 
ground state of electrons in solids is a many body prob- 



lem which, at present, can be solved only approximately. 
The computational framework of choice for a wide range 
of problems involving a system of ions and interacting 
electrons is DFTEl. The central theorem of DFT, proven 
by Hohenberg and Kohn, states that the ground state 
energy of an electronic system is a unique functional of 
its charge density p(r) and is an extremum (a minimum) 
with respect to variations in the charge density. Kohn 
and ShamEJ expressed the charge density in terms of single 
particle wavefunctions ip a (j) (referred to as Kohn-Sham 
orbitals) and occupation numbers f a 

p(r)=J2f a \Mr)\ 2 - 

a 

The ground state energy functional is then given by 
1 
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where V^xt (r) is the external potential experienced by the 
electrons due to the presence of the ions, E H is the elec- 
trostatic (also known as Hartree) energy due to Coulomb 
repulsion of electrons and E xc is the exchange-correlation 
(XC) contribution, which embodies the many-body prop- 
erties of the interacting electron system. A variational ar- 
gument in terms of the single-particle states ipa ( r ) leads 
to a set of single-particle equations for fictitious non- 
interacting particles that produce the same density as 
the real electrons: 



-iv 2 + y off (p(r),r) 



i> a (r) = ea^a(r)- 



(2) 



The effective potential V e s in these single-particle equa- 
tions is: 

V eS (p(r),r) = Kxt(r) + V H [p(r)} + V m \p(r)] (3) 

where V H is the electrostatic potential due to Coulomb 
repulsion between electrons (known as the Hartree poten- 
tial) and V xc = 6E XC /Sp(r) is the exchange correlation 
potential. The system of Eqs. (g), referred to as Kohn- 
Sham equations, represents a set of nonlinear coupled 
equations due to the dependence of V H and V xc on the 
density (and hence the wave functions if) a ); these equa- 
tions are solved iteratively, beginning with a guess for the 
V'a's, until self-consistency is achieved. 

The only significant approximation in this set of equa- 
tions is the form of E xc [p(r)], which is not analytically 
known. The standard choices involve expressions that de- 
pend locally on p(r) (known as the Local Density Approx- 
imation — LDA), or involve both p(r) and its gradients 
(known as the Generalized Gradient Approximation — 
GGA). Such expressions have been derived from analyz- 
ing the behavior of the uniform or non-uniform electron 
gas in certain limits, or by fitting the results of accu- 
rate calculations based on quantum Monte Carlo tech- 
niques for sampling the many-body wavefunction; they 
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work well in reproducing the energetics of a wide vari- 
ety of ground state structures of extended (crystalline) 
or finite (cluster or molecular) systems. We discuss next 
the basic features of the LDA and GGA approaches and 
their limitations and capabilities. 

In the LDA, the exchange-correlation functional is ex- 
pressed as: 

Kc A [p(r)} = J P(r) £ ° c (p(r))d 3 r. (4) 

when the number of spin-up and spin-down states in the 
system are equal (we refer to this as the "spin compen- 
sated" case). In this expression e^cG ) * s ^ ne exchange 
correlation energy of the uniform electron gas of density 
p, which can be obtained by more elaborate computa- 
tional methods like quantum Monte Carlo, or even ana- 
lytically in certain limits. The ground state of a number 
of physical systems, such as atoms, molecules, and mag- 
netic crystals can exhibit nonzero spin-polarization. In 
the above description of DFT, the single particle states 
label a can be extended to include the spin (up or down) 
quantum number, and the energy functional can be read- 
ily generalized to take into account the spin-polarized 
electrons. The spin-polarized version of the exchange- 
correlation energy functional in the so called Local Spin 
Density Approximation (LSDA) is written as 

E^[p,(r), Pl (r)} = J p(r)4 c (PT(r),Pi(r))d 3 r, (5) 

where p j (r) and pi (r) are the electronic densities of spin- 
up and spin-down electrons, in terms of which the total 
electronic density p{v) is: 

p(r) = p T (r) +pi(r). 

Among the various proposed XC functionals e^ c (p), 
in our approach to the DFT method we have imple- 
mented two different aarametrizations of the Ceperley- 
Alder (CA) functionalta for both spin-polarized and spirt, 
compensated systems: the first is the Perdew-ZungerEl 
pararnetrization (PZ-CA) and the second is the Perdew- 
WangQ parametrization (PW-CA). The PW-CA func- 
tional uses a more accurate spin interpolation formula far 
the correlation, proposed by Vosko, Wilk, and NusairQ, 
which is based on the random-phase approximation; 
the PZ-CA functional uses the von Barth-Hcdirn spin- 
dependence for the correlation which is correct for the 
exchange part of the functional. 

Although the L(S)DA has proved successful in a variety 
of chemical and physical applications, it suffers certain 
well known deficiencies. Among those, the most serious 
are: 

(i) The tendency to produce more bonding in solids 
than is observed experimentally; manifestations of 
this tendency include the underestimate of the lat- 
tice constant or bond length and the overestimate 
of the cohesive energy and the bulk modulus. 



(ii) Poor representation of activation energies which are 
related to chemical reactions or transitions between 
structures. 

(iii) Incorrect relative stability of different magnetic 
phases for some magnetic materials. 

In order to correct these deficiencies, expressions for the 
XC functional which go beyond the density and include 
gradients of the density have been devised. In the so- 
called generalized gradient approximation (GGA), the 
XC energy functional is expressed as follows: 

^c GA [PT(r),Pl(r)] = 
J p(r) e - A (p T (r),Pi(r),V /3T (r),V Pi (r))d 3 r. (6) 

Applications of GGA to real materials show a tendency 
to over-correct the deficiencies of L(S)DA. For instance, 
the lattice constants of common crystalline solids tend 
to be overestimated within GjGA calculations, while bulk 
moduli are underestimatedeHI9. We have implemented 
in HARES the recently developed paxameter-free GGA 
functionals (PW9lOO and PBE96BEJiil). Present ca- 
pabilities include the use of the GGA functional in two 
different modes : (a) fully self-consistent GGA calcula- 
tions and (b) a posteriori correction of the total energy 
with the perturbative GGA XC correction applied at the 
end of a self-consistent LDA calculation as: 

A£ total = E^ [p^] - E£ A [p LDA ] . (7) 

It is important to point out that the core-valence XC in- 
teraction is significantly different between LDA and GGA 
as was noted by Fuchs et aZJlj Therefore, in order to en- 
sure the reliability of the GGA results, it is necessary 
to perform either the fully self-consistent GGA calcula- 
tion using GGA-constructed pseudopotentials (referred 
to as mode (a) above) or the a posteriori GGA correc- 
tion after the self-consistent LDA calculation using the 
LDA-constructed pseudopotentials with the partial-core 
electron density (referred to as mode (b) above). 

For the spin-polarized systems, we consider two dif- 
ferent modes of the computation: (1) The conventional 
unconstrained calculation, where the total electron den- 
sity and the magnetic moment are determined simulta- 
neously and self-consiste^tly; and (2) the fixed spin mo- 
ment (FSM) methocOlIZHla, which constrains the mag- 
netic moment to be constant, but allows the possibility 
of different Fermi energies for the spin-up and the spin- 
down electron densities. The latter method has certain 
advantages: A series of FSM calculations with different 
magnetic moments provide the total energy as a func- 
tion of the magnetic moment, yielding detailed informa- 
tion about the magnetic phase. In addition, the FSM 
calculations rapidly achieve self-consistency and are nu- 
merically more stable compared with the unconstrained 
calculations. 



4 



B. Computational Approach 

There exist a variety of methods for solving the set 
of single-particle equations derived from DFT, Eqs. (||). 
In the broadest classification, these methods fall into 
two categories, depending on how they describe the 
single-particle wavefunctions and charge density: Meth- 
ods in the first category use explicit basis sets to 
represent the wavefunctions and charge density, while 
those in the second category use finite, discrete grids 
(or meshes\— .of .-points, on which these functions are 
represented. Ej-E^EiHtElo A standard approach of the 
first type employs a PW. .basis, which is a natural ba- 
sis for periodic systems.Otlj The plane waves needed in 
the expansion are determined by the reciprocal lattice of 
the crystal while the number of plane waves included in 
the basis is determined by the highest kinetic energy, a 
parameter referred to as the "energy cutoff" . HARES 
falls in the second category of methods, as it employs a 
discrete mesh for the calculation. An important differ- 
ence in the two types of methods is that in the former 
all operators have a unique representation once the basis 
set is chosen, whereas in the latter operators involving 
differentials have many possible representations with dif- 
ferent order of approximation. In this sense, the latter 
type of methods involve an additional degree of approx- 
imation. Both types of methods map the Kohn-Sham 
problem onto a matrix eigenvalue problem, denoted by 
H KS . One of the desirable features of grid-based methods 
is to produce a sparse matrix H KS ; this makes it possible 
to employ iterative algorithms for its solution. 



1. Adaptive Coordinate Transformation 

HARES uses a uniform grid in curvilinear space which 
is analytically mapped onto a grid in real space with reso- 
lution (grid-spacing) adapted to natural inhomogeneities 
in the problem. With the use of the adaptive grid, 
one can use HARES for both all-electron and pseudopo- 
tential calculations. However, use of pscudopotcntials 
proves effective in most practical calculations. In the 
Kohn-Sham problem, inhomogeneities arise fundamen- 
tally from 14xt(r), which is the potential that each elec- 
tron experiences due to the presence of the nuclei. The 
Cartesian coordinates x z (£ a ;P m ) depend on the curvi- 
linear coordinates £ Q and a set of parameters P m that 
allows tuning of the coordinate representation to a par- 
ticular physical problem. The Jacobian of the transfor- 
mation is 

J^;P)=dx i /de (8) 

and describes how derivatives transform between the co- 
ordinate systems; its determinant \J\ = det J l a is a mea- 
sure of how the volume element is changed by the coor- 
dinate transformation. The metric giving the elemental 
length associated with infinitesimal displacement is given 



by 

<f* = (J-^iJ- 1 )?. (9) 

Details of the coordinate transformation can be found in 
Ref. |. 

The gist of the transformation is to enhance spatial res- 
olution in the region where it is desirable to increase the 
accuracy of the finite-difference derivatives and the repre- 
sentation of charge density inhomogeneities. The equiv- 
alent enhancement of resolution in the PW approach is 
the increase of the energy cutoff. The connection between 
the effective energy cutoff and the local resolution of the 
HARES grid is given by the factor | J|~ 2 / 3 . The differen- 
tial equation of the Kohn-Sham problem in the adaptive 
grid representation becomes a finite matrix eigenvalue 
problem, with only the kinetic energy term (the Lapla- 
cian in the single-particle equations) having off-diagonal 
elements. 

The uniform mesh in £— coordinates is subsequently 
broken into blocks that are distributed over a number 
of processors on a parallel computer architecture. The 
wavefunctions, potentials, and charge density are repre- 
sented on this mesh allowing for balanced distribution 
on processors. In the iterative solution of the eigenvalue 
problem, an operation that is performed frequently dur- 
ing the calculation is the product of the Hamiltonian 
matrix with a vector representing a single-particle wave- 
function. In parallel execution of this operation, it is 
the kinetic energy term (the Laplacian) with off-diagonal 
elements that requires most of the communication and 
makes the solution of the eigenvalue problem nontrivial. 

2. Boundary Conditions 

In DFT calculations based on a real-space grid, bound- 
ary conditions enter in the way the Laplacian is applied to 
a function. There are only two aspects of a calculation 
where this is relevant: (i) the kinetic energy operator, 
that is, the Laplacian acting on wavefunctions and (ii) 
the calculation of the electrostatic potential, obtained by 
solving the Poisson equation, that is, the Laplacian act- 
ing on the potential. Since the calculation of the Lapla- 
cian (represented as a finite difference) of a function at 
a given grid point uses values of the function at adja- 
cent grid points, imposition of the boundary conditions 
requires knowledge of the function at a few grid points 
outside the boundary. For calculations on infinite crys- 
talline solids, we use periodic boundary conditions (PBC) 
demanding that the function is periodic in space with 
the period of a unit cell that models a physical system. 
Thus, application of the Laplacian at any point in the 
unit cell involves values of the function inside the unit cell 
making implementation of boundary conditions for this 
case straight-forward. For calculations on finite systems 
(atoms, molecules or clusters), we use open boundary 
conditions (OBC). In this case, the treatment of bound- 
ary conditions is more intricate since the grid points ad- 
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FIG. 1: Cell used in a calculation with open boundary condi- 
tions. 

jacent to the ones on the boundary fall both inside and 
outside the region containing the system. In Fig. |l], we 
show how this is treated in HARES. We choose a rect- 
angular box with a spherical region inside, the interior 
of which is large enough to contain the physical system. 
The thickness of the buffer region surrounding this sphere 
depends on the order of the finite-difference Laplacian, 
and is equal to this order times the grid-spacing. Thus, 
the distance between the sphere and the box vanishes in 
the continuum limit and the ratio of the volumes of the 
sphere to the box becomes 7r/6. The wavefunctions and 
charge density vanish outside the spherical region; this 
takes care of the boundary conditions in the kinetic en- 
ergy part. There is one more term that deserves special 
attention in handling OBC: it is the electrostatic poten- 
tial which has long range, and therefore cannot be as- 
sumed to be zero in the buffer region outside the spherical 
region. We obtain values for the electrostatic potential 
in the buffer region through a multipole expansion up to 
order 4, using the density inside the spherical region; we 
use the values at those points as the boundary conditions 
for V H in the solution of the Poisson equation. 



3. Scaling of Computational Effort 

It is useful at this point to analyze the computational 
effort involved in various aspects of a DFT calculation 
using HARES and compare it to that performed using 
the PW method. 

In the PW method, there is one grid in real (direct) 
space and another in reciprocal space. The wavefunctions 
are stored in reciprocal space on part of the grid inside a 
spherical region with diameter equal to half of the side of 
the full grid. These are transformed into real space us- 
ing the fast Fourier transform (FFT) whenever necessary. 
The calculations of the charge density, the local ionic po- 
tential, and the exchange correlation energy are carried 



TABLE I: Scaling of computational effort with system size in 
HARES and a PW method. 



Calculation of 


HARES 


Plane Wave Method 


Charge density 


0(N 2 ) 


0(N 2 log N) 


Kinetic energy 


0(N 2 ) 


0(N 2 ) 


Local potential 


0(N 2 ) 


0(N 2 log AT) 


Nonlocal potential 


0{N Z ) 


0(N' A ) 


Hartree energy 


O(N) 


O(N) 


XC functional 


O(N) 


O(N) 


Orthogonalization 


0(N 3 ) 


0(N 3 ) 



out in real space, whereas the calculation of the kinetic 
energy and the Hartree energy are carried out in recip- 
rocal space. The calculation of the energy related to the 
nonlocal pseudopotential can be done on either grid. For 
small system sizes, the most time-consuming part is often 
that of performing the FFTs, which scales as 0( N log N) . 
The number of FFTs scales linearly with system size giv- 
ing an overall 0(A 2 logA^) scaling for the entire calcu- 
lation. For large system sizes, the orthonormalization of 
wavefunctions, or equivalent constraints imposed during 
minimization of the energy functional, dominate the com- 
putational time. These operations scale as 0(N 3 ). Cal- 
culation of the contribution from the nonlocal pseudopo- 
tentials normally scales as 0(N 3 ), but can in principle be 
improved to 0(N 2 ) scaling by exploiting the short-range 
character of the potentials in real space. 

In HARES, the wavefunctions are stored on the full 
grid in real space and all operations are performed on 
the same grid, eliminating the need for FFTs. The 
calculation of the kinetic energy is carried out using 
finite-difference formulae for derivatives on a grid in real 
space. The Hartree energy and the long-range electro- 
static potential due to periodic charge density are com- 
puted by solving the Poisson equation, which scales as 
O(N). For large system sizes, orthonormalization of the 
wavefunctions dominates the computational time, which 
then scales as 0(N 3 ). The treatment of the nonlocal 
pseudopotentials also scales as 0(N 3 ), unless their short- 
range character is exploited. We summarize the compar- 
ison of scaling between HARES and a PW method in 
Table § 

The current parallel implementation of HARES is in 
high performance Fortran (HPF), which involves sin- 
gle instruction multiple data (SIMD) coding. Since the 
wavefunctions and charge density are stored in real space 
and distributed across processors, communication be- 
tween processors is necessary in calculating: 

(a) the finite-difference derivatives using the CSHIFT 
operation, which cyclically shifts the data in an ar- 
ray on a grid along the specified direction; and 

(b) the inner product of two functions using the the 
SUM operation. 
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TABLE II: Scaling of communication between processors (as- 
sumed to be on a cubic array) used in HARES calculation. 



HPF 


Communication 


Number of 


operation 


per operation 


operations 


CSHIFT 


0(N 2 ^ 3 ) 


O(N) 


SUM 


0(N°) 


0(N 2 ) 



The scaling of inter-processor communication with sys- 
tem size is presented in Table [n| For large enough system 
size, the SUM operations dominate the communications 
in a parallel calculation. 



III. ALGORITHMIC ENHANCEMENTS AND 
IMPLEMENTATION 

A. Non-orthogonal Unit Cell 

Finite-difference formulae for derivatives of functions 
represented on a grid with finite spacing are designed to 
achieve high accuracy and are reasonably accurate for a 
polynomial function up to certain order. These formu- 
lae are typically derived for functions of one variable on 
grids of uniform spacing. Their generalization to higher 
dimensions is trivial through direct product if the grid is 
orthogonal in the various dimensions. For periodic crys- 
tals with non-orthogonal unit cell, it is often not possible 
to design an orthogonal grid with the same periodicity in 
all directions; in this case the implementation of finite- 
difference formulae is not trivial. 

The coordinate transformation employed in HARES 
provides a very simple method to treat non-orthogonal 
unit cells and grids. In this case, the transformation is 
uniform throughout the unit cell and maps an orthogonal 
unit cell in space onto a non-orthogonal unit cell in 
a;— space. If F is a matrix that gives the deformation of 
the orthogonal unit cell into the one under study (i.e. its 
columns are the non-orthogonal unit cell vectors), one 
can always obtain a transformation that is symmetric 
by filtering out the rotational part of F as follows: first 
obtain an auxiliary matrix M defined as M = F ■ F T and 
then diagonalize M to obtain a diagonal matrix D. The 
Jacobian for a rotation-free transformation is then given 
by 

J = T" 1 • L>2 • T, 

where T is a matrix that diagonalizes M: D = T ■ M ■ 
T _1 . Once the mapping onto an orthogonal grid is 
obtained, the derivatives, length and volume elements 
can be obtained using the formalism described in section 

inn. 



B. Preconditioned Conjugate Gradients Solver 

The dominant part of a DFT calculation often con- 
sists of solving an eigenvalue problem, that is, obtaining 
the lowest few (compared with the full spectrum) eigen- 
values and eigenvectors of a very large matrix. In the 
case of PW basis, the size of the matrix is determined 
by the number of PW components included in the basis. 
In the case of HARES, the size of the matrix is deter- 
mined by the number of points that constitute the real 
space grid. For small enough matrices the standard tech- 
niques of linear algebra can be employed, which give the 
exact (within the numerical accuracy of the algorithm) 
eigenvalues and eigenvectors of the matrix. When the 
size of the matrix is large, the conventional methods are 
not practical and the only alternative is to employ iter- 
ative approaches which approximate the eigenvalues and 
eigenvectors in successively improving steps. We consid- 
ered two iterative algorithms for the diagonalization task 
in HARES: 

(a) An Inverse Iteration (II) algorithm with multigrid 
preconditioningtJ; 

(b) A Conjugate Gradient (CG) algorithmil with suit- 
able preconditioning in real space. 

The implementation of the former has been presented 
earlieia and we want to focus on the CG algorithm in 
this subsection. 

In Fig. |[ we present a flowchart of the CG algorithm. 
It is similar to the one in Ref. |2^, presented for a PW 
basis. In real space, most steps in the algorithm remain 
unchanged except for the preconditioning. The main idea 
in preconditioning is to filter out high Fourier compo- 
nents in the wavefunctions and the charge density. We 
achieve this through multiple application of a coarsening 
transformation. For example, the coarsening applied to 
the charge density gives: 

p(k,l,m) -> -p(k,l,m) + 

1 (10) 
— [p(k ± 1, 1, m) + p(k, I ± 1, 77i ) + p(k, l,m± 1)], 

where k,l,m are indices of the grid points at which the 
charge density p is calculated. We find that application of 
this transformation twice on the function under consid- 
eration results in adequate preconditioning. Better pre- 
conditioning is possible in principle for selected cases but 
may not be worth the extra effort required; this method 
provides a preconditioner that works reasonably well in 
all cases we have considered. 

Another aspect of a DFT calculation is that the eigen- 
value problem needs to be solved repeatedly while up- 
dating the charge density to achieve self-consistency be- 
tween the wavefunctions and the corresponding effective 
potential, which depends on the density. Specifically, 
a self-consistent DFT calculation starts with an initial 
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Initialize a wavefunction ip a ( r ) corresponding to e, 



rtho normalize tp a to previous ones, ipg, {3 < a 



Obtain gradient: g a = Hift t 



Precondition g a approximately 



Orthogonalize g a to previous eigenvectors ipg, (3 < a 



Obtain a conjugate gradient (Polak formula) 


i 




Orthogonalize g a to present eigenvector ip a 





Line search along g a for the minimum 
(analytical parametrization) 



Update g a and eigenvectors, eigenvalues 



Has ij) a , e a converged? 




YES 



Go to next eigenvalue e a+ i (if needed) 



[ Wop) 

FIG. 2: Flowchart of the conjugate-gradient algorithm for the 
eigenvalue problem. 



guess for the density and subsequently follows an iter- 
ative procedure of alternating steps of diagonalization 
and improved estimate for the charge density. Each iter- 
ation i starts with a charge density p\ n and obtains the 
part of the eigenspectrum of H KS that corresponds to 
occupied single-particle states and some low-lying unoc- 
cupied states, which depends on p\ n through the effective 
potential. At the end of an iteration the eigenfunctions 
are used to obtain an output charge density p l out . An im- 
proved estimate for the charge density is obtained from 
p\ n and p l out ; this is discussed in detail in the following 
subsection. 

We observe that diagonalization with full convergence, 
that is, with the tolerance for the difference between the 
p\ n and p l out set to be very low, can be computation- 
ally demanding and depends on the initial guess for the 
density. Since the charge density at the initial stage of 
self-consistency is far away from the self-consistent one, 
accurate diagonalization of the Hamiltonian matrix at 
this stage is not worthwhile. Accordingly, we limit the 
number of CG steps for diagonalization at the initial 
stages of the self-consistency loop to relatively few - 
we found that two CG iterations at this stage yield op- 
timal efficiency. Accurate diagonalization is achieved in 
the course of self-consistency as the initial guess for the 
eigensolver is steadily improved. We have found that 
the performance of the II and CG algorithms for conver- 
gence to self-consistency is comparable. The differences 
are rather small and system-dependent. The advantage 



of the CG algorithm is that all operations take place on 
the same grid, whereas in the II algorithm multi-grid 
preconditioning requires the mesh sizes to be a power of 
two. 



C. Charge Density Mixing 

We return now to the way in which the charge density 
is updated at the end of each iteration. In general, the 
new charge density at the end of step i can be constructed 
from the charge densities of previous steps: 

i 

Pnew = ( K T Pin + K< j Ut Pout) = Pit, 1 1 (H) 

j=i-d 

where k/s are mixing coefficients and d is called the 
"depth" of the mixing procedure, i.e. the number of 
previous iterations used in the improved estimate of the 
density. Various mixing schemes are available and have 
been discussed in Ref. [||. 

In our work, we added another feature to mixing: we 
optimize the strength of mixing, that is, the values of the 
parameters Kj , as a function of iteration. This feature can 
be used along with most of the mixing schemes employed 
in the literature. To illustrate the basic idea, we take a 
simple realization of a mixing scheme: 

Pnew = Pin + K {Pout ~ Pin)- (12) 

k = 1 is an extreme case where no knowledge of the input 
density is used, and n = corresponds to the opposite ex- 
treme where the density is not updated at all. For n = 1 , 
there can be oscillations between input and output densi- 
ties corresponding to underdamped mixing. On the other 
hand, for small n the oscillations are overdamped result- 
ing in slow update of the density and hence the approach 
to self-consistency. We have devised a way to achieve 
the optimal critical damping in the mixing procedure. In 
our scheme, we calculate the root mean square change in 
density 5p l at each iteration i: 

6pl= {h J \Pout-pln\ 2 d 3 ry , 

where f2 is the unit cell volume, and define the rate of 
self-consistency as 

_ aiog(V) 
Rl = dt ' 

where t is a fictitious time associated with iterations. R t 
is a rough measure of how well a calculation is evolving 
toward a self-consistent solution. We monitor both the 
rate and the mixing coefficient at each iteration and can 
estimate A = dR/dn. If Ri is too small (< 0.2), the 
approach to self-consistency is too slow. In that case, 
the mixing coefficient is increased or decreased, depend- 
ing on the sign of A, by an amount Are, — a positive 
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A corresponds to smaller k and a negative A to larger k 
compared with the optimal value of this parameter. This 
has the effect of keeping the strength of mixing k near a 
value where R is optimal i.e., in the neighborhood of crit- 
ical damping. The magnitude of change Ak is reduced 
over time to make sure that it converges to its optimal 
value. 



D. Nonlocal Pseudopotential Projector 



The nonlocal part of tke-pseudopotentials Vnl(*, r') is 
used in a separable formEJH to facilitate fast calculation 
of its product with the wavefunction tj;: 



a,l,m 



where a is an atomic index, I and m are angular quan- 
tum indices, and T}f m are constants related to the normal- 
ization of the nonlocal projectors 4>f m . Straightforward 
application of Vjvx(r, r') on the wavefunctions scales as 
0(N 3 ), since there are iV e wavefunctions on N grid 
points, and N a atoms (the number of grid points N is 
proportional to the system size measured by either N a or 
N e , which also scale with each other). This computation 
can be accelerated significantly, by noticing that (r\4>f m ) 
is a localized function centered on atom a. Thus, a cal- 
culation of (4>? m \' t l ; ) m rea l s P ace involves only the grid 
points near atom a making it an 0(N ) computation, 
which gives 0(N 2 ) for the overall calculation involving 
the non-local pseudopotential. We use a filtered pseu- 
dopotential approachLa with a filter that is smooth in 
/c-space (as opposed to the theta-function used in plane- 
wave approaches) . This minimizes the errors in the eval- 



uation of 



,J lm\ 



in addition to using the adaptive grid. 



The parallel implementation of such a calculation is not 
trivial, since it involves only those processors which store 
the grid points near the given atom a. To address this 
issue, we represent the atomic projectors <j>f m (jc) in terms 
of the packed projectors xf m (r) shown schematically as 
follows: 



{4>f m {v)\a=l, 
{xL(r),/?L(r)|j, 



■N} 



(13) 



where X/ m ( r ) is defined as the projector of the j-th largest 
magnitude at a given grid point r (e.g. ^7 m (r)), and 
Pim( r ) IS t ne index of the atom from which X/ m ( r ) was 
generated (in the case of the above example (3f m (r) = 7) . 
We only keep a number of important pseudoprojectors 
Md, (thus letting j vary from 1 to M<j), which we call the 
depth of the packed projectors. We find that Md = 3 is 
typically sufficient; for example, this is exact if the non- 
local projectors of at most three atoms are nonzero at 
any point r. This changes the scaling of memory require- 
ments for the nonlocal potential from 0{N 2 ) to O(N). 



With this choice of packing, the expression for the inner 
product becomes: 



EE'* 



l -J 

(r),a vMn 



|r)(i#>. 



This is readily evaluated using an EXTRINSIC subrou- 
tine call in HPF, which involves execution of the whole 
routine on each processor, but on different data. Effec- 
tively, an inner product (<pf m \tp) with contribution from 
only the grid points inside a sphere centered at atom a 
is calculated by distributing the data with respect to a 
rather than grid points. 



E. Computation of Forces 

Forces on the atoms are calculated using the Hellman- 
Feynman theorem, as is usual in DFT calculations: 



dV cy 



where R a is the position of atom a. For reasons similar 
to those mentioned in the previous subsection, the contri- 
bution of the nonlocal pseudopotential to atomic forces is 
computationally demanding and scales as 0(N 3 ). With 
packed projectors (r|X/ m ), the scaling of this computa- 
tional cost is improved to 0(N 2 ). This is a significant 
improvement for problems involving structural relaxation 
of large systems. 

The implementation of packed projectors in the calcu- 
lation of forces deserves further elaboration. The force 
calculation with nonlocal pseudopotentials involves both 
the projectors 4>f m {v) and their derivatives d(j)f m /dR, a . 
Since the latter is needed only during the calculation of 
forces, it does not need to be packed and stored but can 
be obtained at the time when it is needed. The inner 
products of 4>f m with ipi have to be computed for all 
atoms a at once at the beginning of a force calculation 
since they are packed. With these improvements, we ob- 
tain a factor of 7 speedup in the calculation of forces for 
systems containing about 30 atoms. 

Finally, we should note that as in any method that in- 
volves a computational basis which changes with the po- 
sitions of the atoms, the adaptive grid generates fictitious 
forces referred to as Pulay forces. We have implemented 
the correction related to the Pulay forces and found that 
it is not significant when the grid is refined to the point 
where it yields adequate accuracy. Thus, for all practical 
purposes, with an accurate grid the Pulay correction to 
the forces can be neglected (see also Ref. [|). 

F. Geometry Optimization 

In modern ab initio total energy calculations, one 
of the objectives is to obtain minimum energy geome- 
tries (corresponding to local minima or, if possible, the 
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global minimum of the energy), where the Hcllmann- 
Feynman force on each atom is zero or, more precisely, 
smaller in magnitude than a prescribed value (typically 
< 0.5 mRy/a.u.). The process in which the initial ionic 
geometry is sequentially updated to relax to a neighbor- 
ing local minimum is referred to as the ionic relaxation. 
Mathematically, the ionic relaxation is a nonlinear opti- 
mization problem, which is a subject of vast interest in 
applied mathematics. 

Many optimization algorithms have been proposed so 
far, which can be broadly classified into three groups: 

(i) algorithms which require only the evaluation of the 
function; 

(ii) algorithms requiring the function values and its 
gradients; 

(iii) algorithms requiring the function values and its 
first and second derivatives (the gradients and the 
Hessians) . 

In electronic structure calculations, the function to be 
optimized is the total energy and its gradients are the 
forces on the atoms. Since the forces are not too com- 
putationally demanding compared to the self-consistency 
loop (especially after implementing the improvement dis- 
cussed in the previous subsection), it is natural to use the 
methods of class (ii) for ionic relaxation. 

Among the gradient algorithms, the quasi-Newton 
(also referred tOpas variable metric) method is known to 
be most efficients. The inverse Hessian is approximated 
and updated at each iteration. Suppose \Ro) is an initial 
estimate of the minimizer of the total energy Btotalj Iffo) 
is the corresponding gradient, and Hq is the initial guess 
for the inverse Hessian. At the n-th relaxation step, the 
next approximate minimizer is given by 

\R n+ i) = \R n ) - PnH n \g n ) (14) 

where the step size (3 n is determined by the line searc iE 
(or the line minimization). The inverse Hessian is up- 
dated by 



Hn+i — H n + A„ 



(15) 



where A„ is the correction to H n and is determined by 
requiring that it satisfies the quasi-Newton condition, 



H n+ i\h n ) — \d n ) 



(16) 



with \h n ) = \g n+ i) - \g n ) and \d n ) = \R n +x) - \R n )- 

Among different update formulae for the in- 
verse Hessiantia, we have implemented the initially- 
scaled Bpwdfn-Fletcher-Goldfarb-Shanno (IS-BFGS) 
expressior£§E3: 



H„+i — Hn — 



H n \h n )(d n \ + \d n )(h n \H n 

{dn\K) 

(h n \H n \h n } \ (d n \d n ) 



(d„\h n ) J (d n \h n ) 



(17) 



where H n = ((d n \h n ) / {h n \H n \h n ))H n , when n = and 
H n = H n , otherwise. 

Implemented in combination with the approximate 
line search algorithmEll, the IS-BFGS provides an ef- 
ficient ionic relaxation tool which assures the conver- 
gence of the approximate inverse Hessian to the cor- 
rect inverse Hessian and is numerically stable. As no- 
ticed in other worksHEa, restarting the update sequence 
[Eq. ([IT])] can be beneficiary in some cases, and we 
have used the following restart criterion: (e?n|.<?n+i) < 
0.5-^/ (d n \d n ) (g n +i\9n+i) ■ This implies that the displace- 
ment of an atom is not too different from the direction 
of the calculated force acting on it. 



G. Dual real-space grid calculations 

Typically, the representation of the charge density and 
local potentials in a DFT calculation needs twice as much 
spatial resolution as that of the wavefunctions. To exploit 
this aspect of electronic structure calculations, we have 
developed a version of HARES which employs two sep- 
arate real-space grids — a coarser one and a finer one. 
The wavefunctions are represented on the coarser grid 
and the charge density on the finer one with half the 
grid-spacing of the former. The finer grid corresponds to 
the FFT grid in a PW calculation. This version of the 
code reduces memory requirements substantially. The 
computational cost of the worst scaling part of the cal- 
culation (the wavefunction orthogonalization) is reduced 
by a factor of 8. 

The transformation from the coarser to the finer grid 
is performed only when the charge density needs to be 
calculated from the wavefunctions. We use wavelet inter- 
polantscEl to achieve this. The Poisson equation is solved 
on the finer mesh to obtain the electrostatic potential. 
The exchange correlation potential and the local part 
of the pseudopotential are also calculated on the finer 
grid; these terms are convolutions in fc-space and in a 
PW method are calculated on the FFT grid. The kinetic 
energy operator and the nonlocal pseudopotential act on 
the wavefunctions directly on the coarser grid. Naturally, 
this necessitates usage of a higher order Laplacian in the 
calculation of the kinetic energy, while the one with lower 
order is adequate in the solution of the Poisson equation. 

To check the accuracy of the dual-grid code, we calcu- 
lated the energy difference between two configurations of 
the O2 molecule and compared it with the result of a PW 
calculation. Both methods were used at different energy 
cutoffs, or equivalently, of spatial resolution. We found 
that the energy difference as a function of the energy 
cutoff behaves the same way in both methods. In fact, 
at a low energy cutoff, the sign of the energy difference 
was inverted in both calculations and the magnitude was 
within 6 % of the correct value. 

While the dual-grid approach to real-space electronic 
structure calculation enhances the performance signifi- 
cantly, we caution that the errors introduced by break- 
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TABLE III: Comparison of performance of HARES and a PW 
method. 



System 


CPU time (8 nodes) 
HARES 


CPU time (1 node) 
CASTEP 2.1 


2 

Si24048 


150 sec 
7.5 hours 


1548 sec 
36 hours 



ing of translational invariance, a feature inherent in real- 
space grids (see Ref. |j), are larger than those in the sin- 
gle finer grid calculations. This is due to the coarser 
grid used in the representation of wavefunctions and the 
higher order expression for the Laplacian on the coarser 
grid. Within the PW method, the same error enters in 
calculating the XC potential on the real-space FFT mesh. 
These errors enter into the calculation of forces and can 
be minimized if necessary by using Fourier-filtered pseu- 
dopotentials. We expect the use of the dual-grid ap- 
proach to be very advantageous at the initial stages of 
ionic relaxation of a system with large number of atoms, 
when the accuracy in forces or wavefunctions is not cru- 
cial since the system is presumably far from its optimal 
structure. 



parison discussed here, both methods use conventional 
norm-conserving pseudopotentials. There exists another 
class-of pseudopotentials, called ultra-soft pseudopoten- 
tiala!3, which reduce the size of the Hamiltonian ma- 
trix substantially, making the calculations more efficient. 
Codes that employ this class of pseudopotentials are nat- 
urally faster, whether they use a PW basis or a real-space 
grid. For instance, the VASP code based on the PW for- 
mulation usesjiltra-soft pseudopotentials and has proven 
quite effectiveEj; the commercial version of CASTEP also 
uses these pseudopotentials^ 2 ]. This class of pseudopoten- 
tials has not been yet implemented in HARES. 



IV. APPLICATIONS 

As a test of the accuracy and the efficiency of the algo- 
rithmic improvements discussed above, wc offer a range of 
example applications of HARES. These include represen- 
tative elemental crystals, some molecules, and a couple 
of rather complex materials — blue molybdenum bronze 
and the TON zeolite. All the calculations were performed 
on a Silicon Graphics Origin 2000, using from 2 to 16 pro- 
cessors in parallel mode. 



H. Performance 



In Table 111, we present a comparison of the perfor- 
mance of HARES and that of a PW code, for DFT cal- 
culations on the O2 molecule and a zeolite, Si24048- We 
have selected for the comparison the academic version of 
CASTEPEl, a PW package which uses all the standard 
methods for such calculations and is freely available to 
academic researchers, as is also the case for HARES. We 
believe this provides the most meanigful comparison of 
the performance of the different approaches, for codes at 
equivalent levels of development and availability to the 
academic community. It is clear that the performance of 
HARES for the oxygen molecule is definitely better than 
the PW code. In general, we find that HARES performs 
better for metallic systems. For the zeolite, which is an 
insulator, CASTEP uses a variational method for direct 
minimization of the total energy, which is not applicable 
to the case of metals. This makes the performance of 
CASTEP very good for such systems, though the perfor- 
mance of HARES is not unacceptable (a factor of 1.66 
slower than CASTEP). This advantage of CASTEP over 
HARES is lost when applied to metallic systems, in many 
of which the academic version of CASTEP available to 
us failed to converge in the self-constistency loop. Wc 
have been informed that in commercial versions of this 
package the problematic convergence to self-consistency 
in metallic systemsJias been solved and the performance 
has been improvedc 2 ! We point out that a comparison of 
ab initio packages, is meaningful only for methods that 
employ the same type of pseudopotentials. In the com- 



A. Study of elemental solids 

The simplest test of the method is its application to 
elemental crystalline solids. We have calculated the ba- 
sic structural and electronic properties for representative 
elemental solids, including alkali metals (Li, K), group 
II A metals (Be, Ca), sp-electron metals (Al, Ga), d- 
electron non-magnetic metals (V, Cu, Mo), d-electron 
magnetic metals (Fe, Ni), and semiconductors and insu- 
lators (Si, C). The properties of these solids are extracted 
from total energy calculations for a given crystal struc- 
ture, using the LDA and applying the a posteriori GGA 
corrections. We have used Jiprm-conserving pseudopo- 
tentials from Bachelet et ao for V, and pseudopoten- 
tials generated with the Troullier and Martinala scheme 
for the all the other elements. We perform the calcula- 
tions as follows: we choose a sufficiently dense grid of 
k-points in the Monkhorst-Pack schemecll and fold it to 
the irreducible part of the Brillouin zone by applying the 
symmetry operations of the point group of the crystal 
including inversion which is always a symmetry opera- 
tion in reciprocal space. We make sure the calculation is 
converged with respect to the real space grid spacing in 
the neighborhood of the anticipated equilibrium lattice 
constant, and keep the grid spacing approximately con- 
stant for a range of lattice constants up to about twice 
the equilibrium value. We then fit the resulting energies 
to powers of f^ 2 / 3 , where fl is the volume of the unit 
cell. We thus obtain accurate values of the equilibrium 
lattice constant and minimum total energy. These val- 
ues are used to fit, the two-parameter Universal Binding 
Energy Relational, which has a simple analytical form 
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□ Li (BCC) 
Be (FCC) 

AC (DIA) 
Al (FCC) 
Si (DIA) 
K (BCC) 

+ Ca(FCC) 
V (BCC) 

*Fe(BCC) 
Ni (FCC) 

□ Cu (FCC) 
<>Ga (FCC) 

Mo (BCC) 
N 2 (MOL) 
V0 2 (MOL) 



1 3 5 

Scaled Separation 

FIG. 3: The bulk energies for various element^ solids and the 
N2 and O2 molecules, scaled using the UBERcS. The energies 
are calculated with the a posteriori gradient correction to the 
LDA results. 



from which the bulk modulus and the cohesive energy 
are obtained. This procedure relies on the fact that the 
total energy differences used to fit the Universal Binding 
Energy Relation converge quicker than the total energy 
with respect to the grid parameters. For the nonmagnetic 
materials we use the non-spin-polarized code to perform 
these calculations for reasons of computational efficiency. 
The free atoms in most of the cases we calculated are po- 
larized, the extreme case being Molybdenum were all six 
valence electrons have the same spin. We thus add to 
the cohesive energy as calculated previously the differ- 
ence of a free atom calculation using the spin-polarized 
and spin-average codes. The free atom calculations can 
be converged to the same, high degree of accuracy by use 
of the adaptive grid and thus the relative energies of the 
spin polarized and unpolarized atoms are evaluated on 
an equal footing with the other energy differences. 

In Table IV, we summarize the results of the calcu- 
lations. 



The experimental values are a compilation of 
results from Kittelo; for certain elements we considered 
a simpler lattice than the experimental ground state, and 
in these cases we compare oux. results to the all-electron 
calculations of Moruzzi et a/E2l. Note that, as expected, 
the LDA results for the lattice constant are lower, and 
for the bulk modulus are higher than the experimental 
values. The gradient correction tends to improve the 
situation. Overall, the agreement with experiment is 
quite good, except for the cohesive energies, which is 
a well known deficiency of the approach we are using. 
Looking at the gradient corrected results, other than Ca 
which seems to have large offsets, the lattice constants 
are within 2.0% of the experimental values, bulk mod- 
uli within 22%, and cohesive energies are within 26%. 
In Fig. |^ we present the scaled results of the gradient 



corrected calculations and the corresponding universal 
curve. The magnetic moments of the ferromagnetic ma- 
terials at their equilibrium lattice constant values as cal- 
culated with the LDA (GC) are 2.06 (2.30) /x B /atom for 
Fe and 0.57 (0.59) /le/atom for Ni. These values com- 
pare well with the experimental values of 2.22 /iB/atom 
for Fe and 0.61 ^e/atom for Ni, as well as with the sum- 
mary of the results of various theoretical methods that is 
presented in Ref. |5ll. 



B. Study of small molecules 



In Sec 
conditions 



OB 2 



we described how two types of boundary 
OBC and PBC — can be readily used in 
a HARES calculation. Many DFT methods designed to 
do calculations for solids use PBC and are constrained 
to use a large supercell to study an isolated molecule 
or a cluster of atoms. Here, we present results for four 
molecules N 2 , O2, H 2 0, and NH 3 obtained using HARES 
with the two types of boundary conditions keeping all 
other computational parameters fixed. We use a periodic 
box of dimensions 24 x 18 x 18 a.u. 3 for the N 2 and O2 
molecules and 20 x 20 x 20 a.u. 3 box for the H 2 and 
NH3 molecules, with a grid spacing of 0.25 a.u. 

In Table 0, the various results for small molecules are 
summarized. In ammonia, we also calculated the inver- 
sion barrier of the potential energy surface, by relaxing 
the positions of the hydrogen atoms in the plane for se- 
lected heights of the Nitrogen atom. We find that the 
bond-lengths obtained with OBC tend to be smaller than 
those obtained with PBC, though the difference is quite 
small, in most cases smaller than the accuracy in the re- 
ported results. The energies, on the other hand, have 
significantly larger differences. We suggest that this is 
due to the electrostatic interaction of the field in a su- 
percell calculation with PBC. This interaction between 
the molecule and its periodic images changes the energy 
of the molecule. In the examples considered here a cubic 
cell geometry is used, which results in zero dipole inter- 
action (it can be shown analytically that the interaction 
energy of a dipole with a full shell of dipoles is zero) . This 
indicates that the discrepancy is due to higher-order mul- 
tipole terms. The computational time for the calculations 
with PBC and OBC is similar, so there is no particular 
advantage to either approach from the point of compu- 
tational cost. It appears, however, that for truly isolated 
systems the OBC approach gives more realistic results 
due to the absence of any spurious long range fields. 



Electronic Structure of Blue Bronze, 

K3M010O30 



The material called blue bronze (BB), whose chem- 
ical composition is A0.3M0O3 with A an alkali metal, 
exhibits a variety of interesting physical properties in- 
cluding a metal-to-semiconductor transition at T c — 



12 



TABLE IV: Basic structural and electronic properties of selected elemental crystals. The elements marked by f (dagger) are 
not in the experimentally determined crystal lattice bptj a simpler one; for these elements the numbers in the "Expt." column 
are from the all-electron calculations of Moruzzi et alEi. The elements marked by * (asterisk) are considered in the magnetic 
(spin polarized) ground state. 



Element 


Crystal 




Q (A) 






B (GPa) 






E coh (eV) 








LDA 


CCA 


Expt. 


LDA 


CCA 


Expt. 


LDA 


CCA 


Expt. 


Li 


BCC 


3.39 


3.46 


3.49 


14.4 


13.3 


11.6 


2.12 


1.99 


1.63 


Be f 


FCC 


3.16 


3.19 


3.15 


144 


134 


134 


4.72 


4.42 


3.97 


C 


DIA 


3.53 


3.53 


3.57 


464 


451 


443 


8.89 


8.13 


7.37 


Al 


FCC 


4.12 


4.12 


4.05 


71.0 


70.1 


72.2 


3.41 


3.05 


3.39 


Si 


DIA 


5.47 


5.48 


5.43 


88 


84.9 


98.8 


4.56 


4.08 


4.68 


K 


BCC 


5.17 


5.32 


5.23 


4.7 


3.9 


3.2 


1.05 


0.96 


0.93 


Ca 


FCC 


5.39 


5.48 


5.58 


22.4 


19.1 


15.2 


2.80 


2.73 


1.56 


V 


BCC 


3.01 


3.06 


3.03 


198 


169 


162 


7.32 


5.67 


5.31 


Fe* 


BCC 


2.78 


2.90 


2.87 


201 


144 


168 


6.73 


5.39 


4.28 


Ni* 


FCC 


3.48 


3.59 


3.52 


248 


179 


186 


5.70 


4.43 


4.44 


Cu 


FCC 


3.56 


3.68 


3.61 


192 


138 


137 


4.33 


3.10 


3.49 


Ga f 


FCC 


3.92 


3.92 


4.14 


75.6 


73.7 


44.0 


3.22 


2.89 


3.22 


Mo 


BCC 


3.17 


3.21 


3.15 


279 


246 


273 


8.19 


6.18 


6.82 



TABLE V: Calculated structure and energetics of N2, O2, 
H2O and NH3: effect of boundary conditions. In ammonia the 
energy barrier for the inversion isomerization is given. The 
"Expt." column contains the experimental values, the OBC 
column has our results using the open boundary conditions, 
the PBC column has our results using the periodic boundary 
conditions. 



Molecule 


Property 


Expt. 


OBC 


PBC 


N 2 


Bond-length (A) 


1.10 


1.10 


1.10 




Cohesive Energy(eV) 


9.9 


9.8 


10.0 




Vibration (cm -1 ) 


2359 


2395 


2390 


O2 


Bond-length (A) 


1.21 


1.22 


1.22 




Cohesive Energy(eV) 


5.20 


7.63 


7.79 




Vibration (cm -1 ) 


1580 


1584 


1571 


H 2 


Bond-length (A) 


0.98 


0.97 


0.97 




Bond- angle 


104.6° 


105.7° 


104.4° 


NH 3 


H-H distance (A) 


1.64 


1.67 


1.68 




N-H bond-length (A) 


1.00 


1.02 


1.02 




Energy Barrier (eV) 


0.23 


0.13 


0.10 



180 K, quasi-one-dimensional electronic properties above 
T c , and the existence of incommensurate and commensu- 
rate charge density wave (CDW) phasesEl Recently, a 
family of molybdenum bronzes has been extensively stud- 
ied in experiments using angle-resolved photoemission 
spectroscopy (ARPES) to explore a possible realization 
of non-Fermi-liquid bohavip r due to its low-dimensional 
electronic propertiesoOEJ. To our knowledge, the only 
published electronic band calculation of BB is based on 
a tight-binding (TB) method using some model struc- 
turesES. The dispersion of the TB bands around the 
Fermi level is qualitatively different from the ARPES ex- 



perimental resultsEJ. It is of great importance to have 
an accurate and reliable ab initio calculation of the elec- 
tronic structure of BB in order to interpret the ARPES 
measurement in terms of possibly interesting physics. 
With a large number of atoms in the unit cell, includ- 
ing 10 Mo atoms and 30 O atoms which are typically 
difficult to handle with PW approaches, BB provides a 
challenging system for performing state-of-the-art ab ini- 
tio calculations of the electronic structure; this requires 
a highly efficient computational tool such as HARES. 

The structure and the lattice parameters of BB is 
well documented in Ref. ^7|: the Bravais lattice is cen- 
tered monoclinic (CM), the space group is C2/m, and 
the lattice constants of the simple monoclinic cell are 
a 2 = 16.2311 A, a 3 = 7.5502 A, and ai = 9.8614 A with 
the angle f3 = 94.895° between ai and a.2. The basic 
building block of BB is the M0O6 octahedron; ten oc- 
tahedra form a rigid unit by edge-sharing. Within the 
simple monoclinic (SM) cell, two rigid units are arranged 
so that one of them is located at the apex and the other 
at the center of the cell. As a result, neighboring rigid 
units share a corner oxygen to form a slab spanned by 
a.2 — ai and a3 and four infinitely-connected MoO chains 
(per primitive unit cell) parallel to a3 . The crystal struc- 
ture is illustrated in Fig. |] where the simple monoclinic 
cell is indicated by a box; the two different classes of 
octahedra are indicated by different colors: yellow for 
those that participate actively to conduction along the 
high-conduction direction (a 3 ) and blue for those that 
are apparently inactive. 

We have performed electronic structure calculations 
for BB with HARES. For these calculations we use 
the Ceperley- Alder XCj functional as parametrized by 
Perdew and ZungeioQ. The ions are represented 
by norm-conserving pseudopotentials generated by the 
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FIG. 4: The crystal structure of K3M010O30 depicted with 
MoC>6 octahedra and K ions. The box indicates the simple 
monoclinic unit cell, red balls represent K ions, blue octahedra 
represent the electronically inactive M0O6 units, and yellow 
octahedra represent the active M0O6 units. Upper and lower 
panels correspond to top and front view, respectively. 



Troullier-Martins schema^ iii the fully separable form of 
Kleinman and BylanderEMEl. The Brillouin zone (BZ) 
integrations are performed using a 4 x 4 x 4 Monkhorst- 
Pack k-point meshtO in the BZ of the CM cell. We used a 
grid spacing of h ~ 0.33 a.u. which gives around 200,000 
real-space grid points. 

In Fig. ||, we show isosurfaces of the valence electron 
density obtained from the fully self-consistent calcula- 
tion. A few interesting features can be observed from 
the green isosurfaces which correspond to high electron 
density: 

(i) the high density region has a cylindrical shape with 
its axes overlapping with MoO infinite chains; 

(ii) the electron density is "marginally" connected 
along the [110] crystallographic direction, which is 
within the slab but perpendicular to the chain di- 
rection; 

(iii) the electron density is "barely" connected along the 
slab normal. 




FIG. 5: The isosurfaces of the valence electron density : the 
blue and the green surfaces correspond to a very low and a 
high density, respectively. 
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-20 ■ 



FIG. 6: The ab initio band structure of blue bronze: Two 
bands cross the Fermi level and disperse significantly along 
the chain direction (indicated by gray panel) . Notice the neg- 
ligible dispersion in the other directions. Inset is the Brillouin 
zone of the centered monoclinic cell and the reciprocal lattice 
vectors (61, i>2, and 63) of the simple monoclinic cell are indi- 
cated. 



From these observations, we expect that electronic con- 
duction will be highly anisotropic and the chain direction 
is the most favored. On the other hand, the blue iso- 
surfaces, corresponding to low electron density, indicate 
that the valence electrons are largely depleted around the 
potassium ion sites, suggesting that K atoms play the role 
of donors. 

The full spectrum of the energy bands is shown in 
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FIG. 7: The ab initio bands (red) crossing-the Fermi level 
are compared with the tight-banding bandstJ (green) in the 
neighborhood of the Fermi level. Zq is a k-point on the simple 
monoclinic BZ boundary and equal to 63/2. 

Fig. ||. The lower valence band manifold (30 bands) has 
0(2s) orbital character, the upper valence band mani- 
fold (90 bands) has 0(2p) character mixed with Mo(4<i) 
orbitals near the top of the energy range. There are two 
bands crossing the Fermi level shown in red, well sep- 
arated from both the valence band and the conduction 
band manifolds. These bands disperse primarily along 
the chain direction which is indicated by gray panels in 
Fig. §. 

The two partially filled bands of our ab initio calcula- 
tion are qualitatively different from the TB bandsE^I as 
illustrated in Fig. (jj For instance, the two LDA bands 
cross each other near the BZ boundary whereas the TB 
bands do not. Within the SM BZ (from -zo to zo), the 
LDA bands have occupied bandwidths of 1.3 and 0.3 eV 
while both of the TB bands disperse by 0.3 eV. The oc- 
cupied bandwidth of 1.3 eV for the lopdying band is in 
good agreement with the APRES dataa. These qualita- 
tive differences in the ab initio and TB bands originate 
from the correct and unbiased description of the interac- 
tions and the use of a realistic atomic structure in our 
calculations. Our analysis of the wavefunction character 
at the T-point shows anisotropic hybridization between 
the Mo d-states and the O p-states. It would be rather 
difficult to describe this situation within the context of 
the TB approach. 

The significant dispersion of the partially filled bands 
only along the chain direction results in planar Fermi 
surfaces, which are nested along the chain direction. In 
turn, the nested Fermi surfaces induce a CDW. The 
estimated CDW wave vector is ~ 0.75 b 3 compared with 
the observed one in the range of 0.72 - 0.75 b3. A more 
detailed analysis ofj-the physics of this material will be 
presented elsewhereE3. 

In summary, our application of HARES to the elec- 



FIG. 8: Framework in the structure of TON-zeolite, possess- 
ing three types of cavities. O atoms occupy the positions at 
the vertices of tetrahedra and Si (Al) atoms are at the center 
of the red (green) tetrahedra. The Na atom (blue) remains 
inside the cavity in the vicinity of the Al-occupied tetrahe- 
dron. The rectangle in the center denotes the projection of 
the orthorhombic unit cell perpendicular to the c-axis. In the 
bottom left corner we label the three different types of pores. 

tronic structure of BB suggests that an accurate and re- 
liable method with a realistic atomic structure is needed 
in order to investigate the behavior of such complex ma- 
terials. The ab initio energy bands are in good agree- 
ment with the ARPES measurement and the nature of 
the electronic states relevant to conduction can thus be 
elucidated. 



D. Zeolite: Na„Al o 48 

The word "zeolite" (of Greek origin) means "boiling 
stone" and derives from the visible loss of water when 
natural zeolite minerals are heated. Zeolites are materi- 
als with unique properties which make them useful in 
a variety of applications such as oil cracking, nuclear 
waste management, catalysis and animal feed supple- 
ments. They form a well-defined class of naturally occur- 
ring crystalline alumino-silicate minerals. They have el- 
egant three-dimensional structures arising from a frame- 
work of [SiC>4] 4 ~ and [A1C>4] 5 ~ coordination tetrahedra 
linked at their corners. The frameworks are generally 
very open and contain cavities that enclose cations and 
water molecules. The presence of cavities make zeolites 
porous and gives rise to their low density and unique 
properties. Since the cations, water or other molecules 
that can be contained in these cavities, interact weakly 
with the cavity walls, these entities have high mobility in 
the solid zeolite. As a result a number of interesting phys- 
ical and chemical properties arise: facile ion exchange, 
easy water loss upon heating, molecular sieve behavior, 
etc. 
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In this Section, we study, a zeolite which is referred to 
by the code name "TON"Ea and has the general chemical 
formula Na„Al„ Si24- n 04s; we will consider the struc- 
tures for n = and n — 1. The framework of tetrahedra 
in its crystal structure is displayed in Fig. ||[ TON has an 
orthorhombic crystal structure with three different types 
of pores or channels parallel to the c-axis. For n = 0, 
its unit cell has 24 formula units of SiC>2 with a volume 
of about 1320 A 3 . Its space group is Cmc2i and four 
of the 24 formula units are symmetry- independent. For 
n = 1 (called Theta-1), the structure-has been deter- 
mined from X-ray powder experimentsEa. Theta-1 is the 
first reported unidimensional medium-pore high-silica ze- 
olite. 

Starting with the experimental geometryElJ, we relaxed 
the atomic structure of Si2404g using HARES. All the 
bond lengths obtained from the calculation are within 
2% of the experimental values. We next considered four 
independent Si sites where an Al atom can be substituted 
for a Si atom to obtain AlSi23 48, and relaxed its atomic 
structure. Interestingly enough, we find that all four pos- 
sible structures have very similar energies. Within the 
accuracy of our calculations the four sites cannot be dif- 
ferentiated. 

Addition of a sodium atom to the AlSi23 04g structure 
introduces a variety of structural possibilities. We ex- 
plored three possible structures, based on the three types 
of cavities in which the Na atom can be placed. In Fig. 0, 
we show a unit cell of a structure with Na added in the 
largest of the three cavities. Since AlSi2304g is missing 
one electron due to the substitution of a Si atom by an 
Al atom, the added Na atom will naturally prefer to stay 
in the vicinity of the Al atom to which is can donate its 
valence electron. In the relaxed structure, the Na atom 
sits closest to three O atoms that are bonded to the Al 
atom and as a result the Al-0 bonds are elongated. 

In Fig. ||, we also show an isosurface of the electron 
density for NaAlSi23C>48 and compare it with that of 
Si24C>48. The two systems have very similar charge dis- 
tribution except in the region localized near the Na and 
Al atoms. The isosurfaces clearly have the same topol- 
ogy as the geometrical structure in Fig. ^: O atoms are 
at the center of the bulging regions of the isosurface and 
Si atoms at the joints or vertices. This indicates that the 
bonding is primarily ionic, with negatively charged O and 
positively charged Si atoms. Partial covalent character is 
also evident from the fully connected isosurface. 

In NaAlSi23C>48, the valence electron donated by Na 
compensates for the one missing electron in the four 
bonds formed by the Al atom. The nature of bonding of 
Al with O atoms on the opposite side of the Na atom is 
clearly different from that with the three O atoms on the 
Na side. The latter is very similar to the bonding char- 
acter between Si and O in Si24C>48. The charge on both 
Al and Na is positive, which has the effect of displacing 
the Al atom slightly away from the Na atom resulting in 
longer Al-0 bonds. This introduces small structural dis- 
tortions and changes in charge distribution in the neigh- 




FIG. 9: From left to right, top to bottom, we show the space- 
fill structure model and the charge density isosurfaces of the 
Si24 048 and the NaAlSi23 048 structures. In the structure 
models the Oxygen atoms are red, the Silicon atoms are yel- 
low. In the NaAlSi23 048 charge density plot blue corresponds 
to Na, and red to Al. 



boring Si04 tetrahedra. Since the addition of Na results 
in compensating electrostatic and covalent interactions, 
we expect that the energy barrier in the process of at- 
tachment of Na (or in general a cation) to the walls of 
cavities in this zeolite should be very small. Further in- 
vestigation of the chemical activity inside these pores and 
its effect in the electronic stmcture of the zeolite will be 
the subject of future studieslEl. 



V. SUMMARY 

In this paper, we provided a comprehensive review of 
the theory underlying HARES, which is a method for 
ab initio electronic structure calculations implementated 
using HPF on a shared memory parallel computer archi- 
tecture. Several applications of the method to calculate 
the properties of simple and complex physical systems 
were presented to illustrate its capabilities. We obtained 
the bulk features of elemental solids such as equilibrium 
lattice constant, bulk modulus and cohesive energy, for 
elements from many different columns of the Periodic 
Table, and find good agreement with experiment within 
the limitations of DFT/LDA calculations. For the small 
molecules N2, O2, H2O, and NH3, we find that the struc- 
tural features do not depend on boundary conditions 
(open or periodic) used in the calculation, while the en- 
ergy is sensitive to the the choice of boundary conditions. 
Application of the method to blue molybdenum bronze 
and a zeolite demonstrate that it can be used effectively 
to study complex material systems. In the case of Blue 
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Bronze the results help to clarify important issues of the 
electronic structure pertaining to recent experiments. In 
our study of the TON zeolite S124O48 and its variations 
containing Al and Na atoms, we demonstrated the ability 
of the method to capture the nature of bonding between 
a cation and the walls of cavities in the zeolite; such inter- 
actions arc related to the mobility of ions and molecules 
inside pores of the zeolite framework and should give rise 
to interesting physical and chemical behavior. 



search through the Common High-performance Scientific 
Software Initiative (CHSSI) and the High Performance 
Computation Modernization Office. This set of codes is 
available upon requestEj. The subsequent development of 
HARES was funded by Ryoka Systems Inc. The authors 
wish to acknowledge useful discussions and collaborations 
with Melvin Chen and Greg Smith, and useful comments 
from Nick Choly, Ioannis Remediakis, Jose Soler and G.- 
H. Gweon. 
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